Exploring Factors that Contribute to Country Development

Intro

This is an R Markdown blog template. This document will be knit to HTML to produce a webpage that will be hosted publicly via GitHub.

Website publication workflow

  1. Edit Rmd

  2. Knit to HTML to view progress. You may need to click “Open in Browser” for some content to show (sometimes content won’t show until you actually push your changes to GitHub and view the published website).

  3. Commit and push changes when you are ready. The website may take a couple minutes to update automatically after the push, but you may need to clear your browser’s cache or view the page in a private/incognito window to see the changes more quickly.

You can include text, code, and output as usual. Remember to take full advantage of Markdown and follow our Style Guide.

Examples and additional guidance are provided below.

Take note of the default code chunk options in the setup code chunk. For example, unlike the rest of the Rmd files we worked in this semester, the default code chunk option is echo = FALSE, so you will need to set echo = TRUE for any code chunks you would like to display in the blog. You should be thoughtful and intentional about the code you choose to display.

Education and the World: Literacy rates, Human Development Index, and their relationship

You can include links using Markdown syntax as shown.

You should include links to relevant sites as you write. You should additionally include a list of references at the end of your blog with full citations (and relevant links).

Human Development Index (HDI) and Education

In the world map below, countries are colored according to their Human Development Index score. Each country is assigned an HDI score, a number between 0 and 1 designed, in a rough sense, to measure quality of life. Notice that countries further from the equator are more likely to have a high HDI score than countries closer to the equator. This trend shows up as a visual gradient on the map: the further a country is from the equator, the higher its HDI score tends to be and the bluer it appears. This is a tendency, not a general rule. The term “Global South” is often used to describe a collection of so-called “under-developed” countries near the equator and south of it, a grouping that the map below suggests.

However, this map is quite one-dimensional. Just what exactly does HDI tell us? What, in concrete terms, does “human development” mean? The goal of the following analysis is to shed light on HDI and its limitations through other measures, in particular measures related to literacy rates and population density.

Literacy Rate and HDI

We begin our inquiry into HDI and education by asking: Which is a better predictor of literacy rates - HDI, or average number of years of education? Moreover, what does it mean if HDI predicts literacy rates better than average number of years of education?

For each country, we can find an expected number of years of schooling; this is the number of years the average student attends school. In countries where the average years of schooling is higher, we expect to find higher average literacy rates.

For each continent, we calculated two correlation coefficients. First, we found the correlation between HDI score and literacy rate; in other words, how well does HDI predict literacy rate for that continent? Second, we found the correlation between average years of education and literacy rate; in other words, how well does years of schooling predict literacy rate for that continent?

Next, for each continent, we found the difference between these two correlations. The interesting results are those where this difference is small. A small difference in these two values means that “development” is as good a predictor of literacy rates as years of education. A small difference indicates that non-educational “developmental” factors are influencing literacy rates.
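
The comparison described above can be sketched in a few lines of base R. This is a minimal illustration on made-up numbers, not the code or data behind this post: the data frame `toy`, its column names, the helper `compare_predictors()`, and the choice to take the absolute difference are all assumptions made for the example.

```r
# Toy data: hypothetical values, NOT the data used in this post.
toy <- data.frame(
  region    = rep(c("A", "B"), each = 5),
  hdi       = c(0.90, 0.85, 0.80, 0.75, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40),
  schooling = c(16, 13, 15, 12, 14, 10, 9, 8, 7, 6),
  literacy  = c(99, 98, 97, 95, 94, 80, 75, 68, 62, 55)
)

# For one region's rows, compute both correlations with literacy
# and the absolute difference between them (hypothetical helper).
compare_predictors <- function(d) {
  cor_hdi <- cor(d$hdi, d$literacy)
  cor_schooling <- cor(d$schooling, d$literacy)
  data.frame(
    region = d$region[1],
    cor_hdi = cor_hdi,
    cor_schooling = cor_schooling,
    abs_difference = abs(cor_schooling - cor_hdi)
  )
}

# Split by region, apply the helper to each piece, and stack the results.
cor_by_region <- do.call(rbind, lapply(split(toy, toy$region), compare_predictors))
print(cor_by_region)
```

In this toy example, region B is built so that HDI and years of schooling move in lockstep, so the two correlations coincide and the difference is essentially zero. Region A has a noisier schooling variable, so its difference is larger.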

Observe that two continents, South America and Africa, are picked out as having a smaller difference. This means that in these two continents, extra-educational factors are influencing literacy rates. This observation tracks with the delineation into “Global South” and “Global North” indicated by the plot of HDI. That is, the literacy rates of South America and Africa, continents situated in the Global South, suffer from extra-educational factors.

One problem with this analysis is that it is not granular. It gives us a view of the world that is split into seven, when in reality, the world has far more than seven borders.

Our next analysis clusters countries according to literacy rate and population density. The goal of the analysis is to show that the division into Global North and Global South is inadequate to understand differences in literacy rates. In other words, the delineation into North and South indicated by HDI is a simplification - the actual situation is more complicated.

Before this analysis can proceed, we first make an observation about the relationship between population density and literacy rates. Compare the plots of Population Density vs. Literacy Rate, and Log of Population Density vs. Literacy Rate. Observe that a curve of best fit on the first plot would be exponential, while on the second, a line of best fit would be straight. This suggests that for the purposes of clustering, it would be appropriate to cluster Log of Population Density against Literacy Rate.
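
To see why the log transform helps, here is a small sketch on invented numbers. The vectors `literacy` and `density` are hypothetical values chosen so that density grows roughly exponentially; they are not the data used in this post.

```r
# Hypothetical values, NOT the data used in this post.
literacy <- c(50, 60, 70, 80, 90, 100)
density  <- c(5, 12, 30, 80, 200, 520)

# Pearson correlation measures how close a relationship is to a straight line.
cor_raw <- cor(literacy, density)       # curved relationship
cor_log <- cor(literacy, log(density))  # close to a straight line

print(c(raw = cor_raw, log = cor_log))
```

When the raw relationship is curved in this way, the correlation after taking logs is noticeably closer to 1, which is one simple way to check that the transformed variable is the better input for a distance-based method such as clustering.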

The elbow plot shows that a cluster analysis using three clusters is most appropriate. The plot below associates each country with one of three clusters, and the table that follows lists each cluster’s center in standardized units. The first cluster, 1, consists of countries with high literacy rate and low population density. The second cluster, 2, consists of countries with low literacy rate. Notice that this cluster ranges over a wide variety of population densities. The third cluster, 3, consists of countries with high literacy rate and high population density.

## # A tibble: 3 × 5
##   latestRate_scaled density_scaled  size withinss cluster
##               <dbl>          <dbl> <int>    <dbl> <fct>  
## 1             0.412         -1.07     47     25.9 1      
## 2            -1.77          -0.150    33     42.1 2      
## 3             0.432          0.612    90     54.4 3
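
For readers who want to try this kind of clustering themselves, the steps can be sketched with base R’s kmeans(). This is an illustration under stated assumptions, not the code behind the table above: the data frame `toy` and its values are invented, and the range of k tried for the elbow plot (1 to 6) is an arbitrary choice.

```r
set.seed(1)  # k-means starts from random centers, so fix the seed

# Hypothetical, well-separated toy points, NOT the data used in this post.
toy <- data.frame(
  literacy    = c(95, 97, 99, 96, 98, 94, 60, 55, 65, 50, 58, 62,
                  96, 98, 95, 97, 99, 94),
  log_density = c(1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 3.0, 4.5, 2.0, 5.0, 3.5, 2.5,
                  5.0, 5.2, 4.8, 5.1, 4.9, 5.3)
)

# Standardize both variables so neither dominates the distance calculation.
scaled <- scale(toy)

# Elbow plot input: total within-cluster sum of squares for k = 1..6.
wss <- sapply(1:6, function(k) kmeans(scaled, centers = k, nstart = 25)$tot.withinss)

# Fit the final model with three clusters and summarize each cluster.
fit <- kmeans(scaled, centers = 3, nstart = 25)
summary_table <- data.frame(
  cluster = 1:3,
  fit$centers,
  size = fit$size,
  withinss = fit$withinss
)
print(summary_table)
```

Plotting `wss` against k gives the elbow plot: the point where adding another cluster stops reducing the within-cluster sum of squares by much is the suggested number of clusters. Cluster numbers are arbitrary labels, so the same group of countries can receive a different number from one run to the next.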

The following map colors each country according to its cluster assignment. What is interesting about this map is that groups of contiguous countries are likely to fall into the same cluster. What does this mean? As an example, examine the pair of North African countries, Algeria and Libya. Both lie in Africa, a continent that our previous analysis placed in the Global South.

Whereas HDI assigns a bare number to each country, the cluster map expresses relationships between a country’s position, its population density, and its literacy rate. Notice that there are pockets of countries from the same cluster; that is, countries from a given cluster tend to be surrounded by others from the same cluster.

Algeria and Libya both fall in cluster 1: high literacy rate and low population density. These two contiguous countries share a common situation with respect to literacy, and they are just one example. Throughout the cluster map, groups of contiguous countries tend to fall within the same cluster. This indicates that the assignment of a country to a particular cluster does not depend only on the circumstances within that country; it also depends on the broader regional context in which the country is situated.

Even though, in certain places, HDI is as good a predictor of literacy rate as years of education, the continent-level analysis obscures the fact that regional factors are at play. It simply is not the case that a single number, whether HDI or “Expected Years of Education”, can capture the whole situation regarding the literacy of a country. The reason is made clear by the cluster map: the situation regarding literacy in one country does not depend only on that country. Thus, numbers examining countries in isolation are incapable of expressing the situation. The cluster map, in which pockets of similar countries emerge grouped together geographically, testifies to the reality of this interrelationship. HDI is an excellent tool for examining a country in isolation. But comprehending the literacy situation in a given country requires looking beyond the borders of that country.

Visualizations

Visualizations, particularly interactive ones, will be well-received. That said, do not overuse visualizations. You may be better off with one complicated but well-crafted visualization as opposed to many quick-and-dirty plots. Any plots should be well-thought-out, properly labeled, informative, and visually appealing.

If you want to include dynamic visualizations or tables, you should explore your options from packages that are built from htmlwidgets. These htmlwidgets-based packages offer ways to build lighter-weight, dynamic visualizations or tables that don’t require an R server to run! A more complete list of packages is available on the linked website, but a short list includes:

  • plotly: Interactive graphics with D3
  • leaflet: Interactive maps with OpenStreetMap
  • dygraphs: Interactive time series visualization
  • visNetwork: Network graph visualization with vis.js
  • sparkline: Small inline charts
  • threejs: Interactive 3D graphics

You may embed a published Shiny app in your blog if useful, but be aware that there is a limited window size for embedded objects, which tends to make the user experience of the app worse relative to a dedicated Shiny app page. Additionally, Shiny apps will go idle after a few minutes and have to be reloaded by the user, which may also affect the user experience.

Any Shiny apps embedded in your blog should be accompanied by the link to the published Shiny app (I did this using a figure caption in the code chunk below, but you don’t have to incorporate the link in this way).

Tables

DT package

The DT package is great for making dynamic tables that can be displayed, searched, and filtered by the user without needing an R server or Shiny app!

Note: you should load any packages you use in the setup code chunk as usual. The library() functions are shown below just for demonstration.

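library(DT)
library(dplyr)  # provides select() and the %>% pipe

# Show three mtcars columns as a searchable, filterable table
mtcars %>% 
  select(mpg, cyl, hp) %>% 
  datatable(colnames = c("MPG", "Number of cylinders", "Horsepower"),
            filter = 'top',
            options = list(pageLength = 10, autoWidth = TRUE))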

kableExtra package

You can also use kableExtra for customizing HTML tables.

library(kableExtra)
summary(cars) %>%
  kbl(col.names = c("Speed", "Distance"),
      row.names = FALSE) %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE) %>%
  column_spec(1:2, width = "1.5in")

Speed           Distance
Min.   : 4.0    Min.   :  2.00
1st Qu.:12.0    1st Qu.: 26.00
Median :15.0    Median : 36.00
Mean   :15.4    Mean   : 42.98
3rd Qu.:19.0    3rd Qu.: 56.00
Max.   :25.0    Max.   :120.00

Images

Images and gifs can be displayed using code chunks:

“Safe Space” by artist Kenesha Sneed

This is a figure caption

You may also use Markdown syntax for displaying images as shown below, but code chunks offer easier customization of the image size and alignment.

This is another figure caption

Either way, the file path can be a local path within your project directory or a URL for an image hosted online. This syntax works for PNG, PDF, JPG, and even GIF formats.

Videos

You can use code chunks or Markdown syntax to include links to any valid YouTube or Vimeo URLs (see here for details) or point to a location within your project directory.

Code chunk:

Markdown syntax:

You may need to push your updates to GitHub to see if the videos work.

Equations

You might include equations if part of the purpose of your blog is to explain a statistical method. There are two ways to include equations:

  • Inline: \(b \sim N(0, \sigma^2_b)\)
  • Display-style (displayed on its own line): \[\frac{\sigma^2_b}{\sigma^2_b + \sigma^2_e}\]

For typesetting equations appropriately, check out the AMS-LaTeX quick reference or take a look at the Symbols in math mode section of this cheat sheet (or do some extra Googling—there are many resources).

Formatting

Tabbed subsections

Each subsection below the “Tabbed subsections” section heading will appear in a tab. See R Markdown Cookbook Section 7.6: Put content in tabs for additional customization options.

Bulleted list

You can make a bulleted list like this:

  • item 1
  • item 2
  • item 3

Numbered list

You can make a numbered list like this:

  1. First thing I want to say
  2. Second thing I want to say
  3. Third thing I want to say

Column formatting

Content Column 1

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse vel ipsum eu sem facilisis porttitor. Integer eu tristique lectus. Vestibulum nisi risus, porta sit amet cursus nec, auctor ac tellus. Integer egestas viverra rhoncus. Fusce id sem non ante vestibulum posuere ac sed lorem. Proin id felis a mi pellentesque viverra in at nulla. Duis augue nulla, aliquet ac ligula a, sagittis varius lorem.

Content Column 2

Aliquam non ante et erat luctus hendrerit eu ac justo. Fusce lacinia pulvinar neque non laoreet. Fusce vitae mauris pharetra, scelerisque purus eget, pharetra nisl. Aenean volutpat elementum tortor vitae rhoncus. Phasellus nec tellus euismod neque congue imperdiet tincidunt in mauris. Morbi eu lorem molestie, hendrerit lorem nec, semper massa. Sed vulputate hendrerit ex, eget cursus purus. Pellentesque consequat erat leo, eleifend porttitor lacus porta at. Vivamus faucibus quam ipsum, id condimentum ligula malesuada ultrices. Nullam luctus leo elit, vitae rutrum nibh venenatis eget. Nam at sodales purus. Proin nulla tellus, lacinia eget pretium sed, vehicula aliquet neque. Morbi vel eros elementum, suscipit elit eu, consequat libero. Nulla nec aliquet neque. Nunc bibendum sapien lectus, sed elementum nisi rutrum non. Ut vulputate at lacus eget maximus.

Customizing your blog design

As a final detail only if you have time, you can explore options for customizing the style of your blog. By default, we are using the readthedown theme from the rmdformats package (see Line 6 of this file if you want to switch out themes).

Theme

You can use the rmdformats package to play around with some pre-built themes. There are, I’m sure, many more similar packages with built-in themes, or you can look into how to include a CSS code chunk to customize aspects of a theme.

Using the rmdformats package, you can change the theme itself (Line 6):

  • rmdformats::readthedown
  • rmdformats::downcute
    • For downcute only, you can add a new indented line below Line 6 with the code downcute_theme: "chaos" for the downcute chaos theme
  • rmdformats::robobook
  • rmdformats::material

You can explore additional YAML options by looking at the rmdformats package page or running, for example, ?rmdformats::readthedown() to see the help documentation for a particular theme from the package.

Syntax highlighting

You can also change the code chunk syntax highlighting option (Line 7, highlight):

  • "default"
  • "tango"
  • "pygments"
  • "kate"
  • "monochrome"
  • "espresso"
  • "zenburn"
  • "haddock"
  • "textmate"
  • NULL for no syntax highlighting (not recommended)

Font size, type, and other customization

Further customization requires adding a CSS style file or code chunk or incorporating other development options. Customization beyond the rmdformats package should be your lowest and final priority for the project. Ensure your content is fully prepared first.

References

All data sources, any key R packages, and any other sources used in developing your blog should be cited in full in a list of references at the end of your blog. Your blog post should also link to these sources as they are discussed. You may choose any reference style as long as sources are fully cited (try to be consistent!).

Typically, references in R Markdown (and LaTeX) files are incorporated with a BibTeX database (a .bib file). You can try this approach or manually include either a numbered or alphabetized list.

Columbia University has compiled some guidance on how to cite data. Some data sources will give you the citation information to copy and paste. Use the provided citations or citation styles in those cases.

You can list R package citations with the code citation("packageName") in the console and then copy (and reformat as needed) the relevant text, e.g.,

## 
## To cite package 'DT' in publications use:
## 
##   Xie Y, Cheng J, Tan X (2023). _DT: A Wrapper of the JavaScript
##   Library 'DataTables'_. R package version 0.27,
##   <https://CRAN.R-project.org/package=DT>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {DT: A Wrapper of the JavaScript Library 'DataTables'},
##     author = {Yihui Xie and Joe Cheng and Xianying Tan},
##     year = {2023},
##     note = {R package version 0.27},
##     url = {https://CRAN.R-project.org/package=DT},
##   }

The following citations are based on the American Statistical Association citation style (not all of these references are used in this document):

Baumer, B. S., Kaplan, D. T., and Horton, N. J. (2021), Modern Data Science with R (2nd ed.), Boca Raton, FL: CRC Press.

Broman, K. W. and Woo, K. H. (2018), “Data Organization in Spreadsheets,” The American Statistician, 72:1, 2-10, doi: 10.1080/00031305.2017.1375989.

Columbia University Libraries (n.d.), “Data Citation,” available at https://guides.library.columbia.edu/datacitation.

McNamara, A. and Horton, N. J. (2018), “Wrangling Categorical Data in R,” The American Statistician, 72:1, 97-104, doi: 10.1080/00031305.2017.1356375.

Shah, Syed A. A. (October 2022), “Starbucks Drinks” (Version 1), Kaggle, available at https://www.kaggle.com/datasets/syedasimalishah/starbucks-drinks.

Xie, Y., Cheng, J., and Tan, X. (2022), “DT: A Wrapper of the JavaScript Library ‘DataTables’,” R package version 0.24, available at https://CRAN.R-project.org/package=DT.

Breaking Barriers: Key Factors for Measuring Women’s Progress Across Countries

Why won’t HDI suffice?

Amartya Sen, a Nobel laureate and renowned economist, once said, “empowering women is the key to building the future we want.” This simple yet powerful statement highlights the significance of gender equality and its impact on human development. The notion of human development is rooted in the idea of expanding people’s choices, enabling them to fulfill their potential, and giving them the freedom to lead lives they value. However, the reality is that women do not enjoy these choices and freedoms equally, and they continue to be marginalized across the globe.

While countries with higher HDI ranks are generally associated with greater levels of freedom and empowerment, the reality is more complex. For instance, a country’s overall HDI score may mask significant disparities in gender inequality within its population. In many countries, women continue to face discrimination in areas such as education, employment, and political representation, despite their nation’s high HDI ranking. Moreover, the cultural and social norms prevalent in a country can significantly impact the empowerment of women, even in countries with high HDI scores. Therefore, while HDI rankings can provide a broad measure of a country’s level of human development, it is crucial to examine specific indicators that measure the empowerment of women to gain a more nuanced understanding of gender inequality across the globe.

To gain a better understanding of the complex issues that women face worldwide, we will analyze standardized indicators of women’s empowerment across countries based on their population and HDI rank. This analysis will reveal the factors that contribute to gender inequality and highlight areas for improvement to advance women’s empowerment and create a more equitable society.

Indicators of Women’s Empowerment

We will look at four key indicators of women’s empowerment:

  1. Adolescent Birth Rate: This metric measures the number of births per 1,000 women between the ages of 15 and 19 in a given year. A high adolescent birth rate is often an indicator of poor sexual and reproductive health outcomes for young women, and can also be a barrier to educational and economic opportunities.

  2. Political Participation: This metric measures the extent to which women are involved in political decision-making processes, including representation in elected offices, participation in political parties, and involvement in civil society organizations. Women’s political participation is important for ensuring that their voices and perspectives are heard in policy-making processes.

  3. Labor Participation: This metric measures the percentage of women who are employed or seeking employment in the labor force. A low labor force participation rate can be an indicator of limited economic opportunities for women, which can in turn contribute to poverty and economic inequality.

  4. Secondary Level Education: This metric measures the percentage of women in a given population who have completed secondary education. It is often used as a measure of women’s educational attainment and their access to educational opportunities.

Heatmap: HDI, Population, Key indicators of Women’s Empowerment

This heatmap presents different indicators of women’s empowerment across the 30 most populous countries, ordered by HDI rank (highest HDI rank at the top and lowest at the bottom). Each row represents a country and each column represents an indicator. The colors represent the value of each indicator, with lighter shades indicating lower values and darker shades indicating higher values. The values are standardized to make interpretation easier: a darker cell means that a country has a higher value on that indicator than countries with lighter cells. For most indicators a higher value means the country is doing better; adolescent birth rate is the exception, since a higher rate signals a worse outcome. Additionally, dendrograms at the top and left sides of the heatmap show how countries and indicators cluster together based on similarities in their values.
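
As a rough sketch of how a standardized heatmap with dendrograms can be built, the example below uses base R’s scale(), hclust(), and heatmap(). Everything in it is an assumption made for illustration: the four countries and their indicator values are invented, and base R’s heatmap() stands in for whatever plotting function produced the figure in this post.

```r
# Hypothetical indicator values for four made-up countries.
indicators <- matrix(
  c(10, 45, 60, 85,
    80, 20, 50, 30,
    25, 35, 70, 75,
    95, 10, 40, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Country A", "Country B", "Country C", "Country D"),
    c("adolescent_birth_rate", "political_participation",
      "labor_participation", "secondary_education")
  )
)

# Standardize each indicator (column) to mean 0 and standard deviation 1
# so that indicators measured in different units share one color scale.
standardized <- scale(indicators)

# Hierarchical clustering of countries (rows) and indicators (columns);
# these trees are what the dendrograms display.
country_tree <- hclust(dist(standardized))
indicator_tree <- hclust(dist(t(standardized)))

# Draw the heatmap; scale = "none" because the data is already standardized.
heatmap(
  standardized,
  Rowv = as.dendrogram(country_tree),
  Colv = as.dendrogram(indicator_tree),
  scale = "none",
  margins = c(12, 8)
)
```

Standardizing by column means each cell is read relative to the other countries on the same indicator, which is why colors can be compared down a column but say nothing about the raw units.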

Key Interpretations

  • The heatmap shows that there is generally a negative correlation between adolescent birthrate and level of development, meaning that countries with higher development tend to have lower adolescent birthrates. However, this is not always the case. Some of the exceptions are Uganda and Nigeria, which have high adolescent birthrates despite their moderate development levels. Conversely, some countries with lower HDI rankings, such as India and Algeria, have lower adolescent birthrates.

  • The heatmap reveals that women’s political participation is not consistently correlated with a country’s HDI ranking. Contrary to the common belief that higher HDI ranking equates to greater political participation for women, the data shows that this is not always the case. For instance, countries like Mexico, with a lower HDI ranking, exhibit higher women’s political participation compared to Japan, which has a higher HDI ranking but a lower participation rate. This suggests that factors other than development, such as cultural and social norms, may play a role in determining women’s political participation. Therefore, a more nuanced and context-specific approach is necessary to understand the complex interplay between development and women’s political participation.

  • The heatmap shows a strong positive correlation between women’s labor force participation and HDI ranking. Countries with higher HDI rankings tend to have higher labor force participation rates for women, while those with lower HDI rankings tend to have lower participation rates. However, it is worth noting that there are still significant disparities in women’s labor force participation rates within and across countries, even among those with high HDI rankings.

  • The education level column of the heatmap, specifically reflecting the metric of women’s completion of secondary education, shows a positive correlation with HDI. In general, countries with higher HDI rankings tend to have higher rates of women completing secondary education, indicating greater access to educational opportunities and greater potential for personal and professional growth. However, there are exceptions to this trend. One such exception is Ethiopia, which ranks relatively high in the HDI spectrum but has very low rates of women completing secondary education. This indicates that while Ethiopia has made progress in areas such as healthcare and income, it may face challenges in ensuring equal access to education for women.